Context. Radio frequency interference (RFI) already limits the sensitivity of existing radio telescopes in several frequency bands and may prove to be an even greater obstacle for future generation instruments to overcome.

Aims. I aim to create a structure of radio astronomy correlators which will be statistically stable (robust) in the presence of interference.

Methods. A statistical analysis of the mixture of system noise + signal noise + RFI is proposed here which could be incorporated into the block diagram of a correlator. Order and rank statistics are especially useful when calculated in both temporal and frequency domains.

Results. Several new algorithms of robust correlators are proposed and investigated here. Computer simulations and processing of real data demonstrate the efficacy of the proposed algorithms.

1. Introduction

Correlators are central to the signal processing system of a radio interferometer (Thompson, Moran & Swenson 2001). Signals received from radio sources mixed with system noise (sky noise + receiver noise) can be represented as stochastic processes with a Gaussian (normal) distribution. Each pair of random numbers at the input of the correlator is described as a bivariate random value $(X, Y)$ with the distribution $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho_0)$, where $\mu_X = \mu_Y = 0$ are the mean values, $\sigma_X^2$ and $\sigma_Y^2$ are variances proportional to the intensity of the input noise and $\rho_0$ is the correlation coefficient (normalized visibility for a particular baseline). From the sequences $x_1, ..., x_N$ and $y_1, ..., y_N$ the correlator calculates the so-called Pearson's correlation coefficient



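$$r_P = \frac{\sum_{i=1}^{N}(x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{X})^2 \sum_{i=1}^{N}(y_i - \bar{Y})^2}}, \tag{1}$$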



where $\bar{X}$ and $\bar{Y}$ are the arithmetic averages of $x$ and $y$, respectively. In the limit, $\lim_{N \to \infty} r_P = \rho_0$. This is valid both for XF and FX correlators (the correlation of complex numbers is reduced to separate correlations of the real and imaginary components). The sample correlation coefficient $r_P$ is very sensitive to outliers (samples which are not consistent with the normal distribution $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho_0)$), i.e., the estimate (1) is not statistically robust (Gnanadesikan 1997; Huber 1981). Radio frequency interference (RFI) can produce considerable outliers, yielding a bias of $r_P$ and increasing the standard deviation (rms) of $r_P$. In this paper the terms "RFI" and "outliers" are interchangeable.

Methods of robust statistics allow stable estimates to be obtained in the presence of outliers in several radio-astronomical applications (Fridman 2008). Here these methods are applied to correlation measurements of radio interferometers.

There are different types of RFI (Fridman & Baan 2001) and they differ significantly over the radio astronomy frequency band. Strong and impulse-like RFI on the time-frequency plane are visible in practically all frequency bands of LOFAR, and these types of RFI are considered here because data from the LOFAR Core Station (CS1) are used as examples later in this paper. Fig. 1 gives an impression of the types of interference in one sub-band: the auto-spectrum of the system noise + RFI from the LOFAR CS1 is represented in this figure with high spectral and time resolution.

Fig. 1. Auto-spectrum of system noise + RFI from LOFAR CS1 at the central frequency $f = 217.1875$ MHz, bandwidth $\Delta f = 156.25$ kHz, frequency resolution $\delta f \approx 38$ Hz, integration time $\tau \approx$ …

The following approximate classification is proposed for the 
strong RFI visible at LOFAR: 

a) Narrow-band and persistent RFI; 

b) Narrow impulses (< 1 s, < 100 Hz) on the time-frequency plane;

c) Impulse-like RFI in the time domain (several seconds) which 
are also wide-band (5 -10 MHz); 

d) Random-form bursts on the time-frequency plane. 

There are several types of RFI which cannot be treated as outliers: they may be persistent and wide-band, weak or strong, fully or partly correlated at the sites of a radio interferometer. Methods using the spatial separation of the source of RFI and the radio source can effectively address this issue, see (Ellingson & Hampson 2003; Jeff et al. 2005; Kesteven 2007; Cornwell et al. 2004).

Several types of robust correlators that are able to mitigate the impulse-like, strong RFI in both temporal and frequency domains have been studied in this paper. They are used in applications where input data are contaminated with outliers, and these correlators are statistically more stable than correlator (1). Some of them could be used in radio astronomy, especially in radio interferometric systems with software correlators where the calculation of visibilities is carried out on general-purpose computers, as in LOFAR (LOFAR 2009) and JIVE (Kruithof 2009). Software correlators are, by definition, much more flexible than traditional hardware correlators: any algorithm adapted or modified for a particular observation can be optionally downloaded as a subroutine.

There are two operations in the numerator of (1): multiplication 
of the input samples of X and Y and summation (averaging). 
Here it is proposed that they be modified to make the correlator 
more robust. 

The new features can be introduced in the first operation to an- 
alyze the statistics of X and Y and to introduce a type of editing 
in order to eliminate outliers. 

The second operation, summation, which is considered as post-correlation averaging, can be divided into three steps: primary averaging over a time interval short enough not to smooth out RFI bursts, which at this stage appear above the noise level; RFI mitigation; and secondary averaging to the required time interval, which depends on the observational specifications (wavelength, baseline, radio source properties); see the end of section 3 and Figs. 13-15.
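The three-step averaging described above can be illustrated with a minimal sketch. It assumes NumPy, a hypothetical one-dimensional array `products` of instantaneous correlator products, and Winsorization of the primary averages as the mitigation step; the text does not prescribe a specific mitigation algorithm at this point, so that choice, the function name and the parameters `block` and `gamma` are illustrative assumptions only.

```python
import numpy as np

def three_step_average(products, block, gamma=0.1):
    """Average correlator products in three steps: primary, mitigation, secondary.

    products : 1-D array of instantaneous products x_i * y_i (hypothetical input)
    block    : number of samples per primary-averaging interval
    gamma    : Winsorization fraction applied to the primary averages (assumption)
    """
    products = np.asarray(products, dtype=float)
    n_blocks = len(products) // block
    # Step 1: primary averaging over short intervals, so that bursts stay visible.
    primary = products[:n_blocks * block].reshape(n_blocks, block).mean(axis=1)
    # Step 2: RFI mitigation; Winsorization is an assumed choice here.
    ordered = np.sort(primary)
    k = int(np.floor(gamma * n_blocks))
    cleaned = np.clip(primary, ordered[k], ordered[n_blocks - k - 1])
    # Step 3: secondary averaging to the final integration time.
    return cleaned.mean()
```

With a short primary interval a burst contaminates only a few primary averages, which the mitigation step can then pull back before the final average is formed.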

Different types of correlators described in the following section are compared with Pearson's correlator (1) using two criteria:

1. The bias of the estimate $\hat{\rho}$ produced by RFI compared to the input correlation coefficient $\rho_0$;

2. The effectiveness of an estimator, judged by the rms at the output of the correlator both in the presence and in the absence of outliers.

Computer simulations were performed to estimate the bias and rms. Some results of the processing of CS1 data are also shown.



2. Estimators of the correlation coefficient 

There are two classic estimators of correlation coefficients using ranks of samples instead of the samples themselves (Kendall 1970).

Let two input sample sequences $x_1, ..., x_n$ and $y_1, ..., y_n$ be sorted in ascending order: $x_{(1)} < x_{(2)} < ... < x_{(n)}$ and $y_{(1)} < y_{(2)} < ... < y_{(n)}$. The $i$th value $x_{(i)}$ is called the $i$th-order statistic. Each sample $x_j$ has its $k$th position in the sorted series $x_{(1)}, ..., x_{(n)}$. This number $k$ is the rank of the sample $x_j$ and is denoted by $p_j$ ($= k$). Similarly, the rank of $y_j$ is denoted by $q_j$. Let $(x_i, y_i)$ and $(x_j, y_j)$ with $i = 1, ..., n$ and $j = i + 1, ..., n$ be two data pairs from the original data sequences. If $p_j - p_i$ and $q_j - q_i$ have the same sign, these two data pairs are said to be concordant; otherwise, they are discordant. Let $n_c$ be the number of concordant pairs and $n_d$ the number of discordant pairs. It is clear that $n_c + n_d = n(n - 1)/2$.



2.1. Spearman's correlator 

Spearman's correlation coefficient is calculated by

$$r_{SP} = 1 - \frac{6\sum_{i=1}^{n}(p_i - q_i)^2}{n(n^2 - 1)}. \tag{2}$$

The bivariate correlation coefficient corresponding to Pearson's $r_P$ can be restored using the relationship:

$$\rho = 2\sin\left(\frac{\pi}{6}\, r_{SP}\right). \tag{3}$$

2.2. Kendall's correlator

Kendall's correlation coefficient is calculated by

$$r_{KND} = \frac{2(n_c - n_d)}{n(n - 1)}. \tag{4}$$

The bivariate correlation coefficient corresponding to Pearson's $r_P$ can be restored using the relationship:

$$\rho = \sin\left(\frac{\pi}{2}\, r_{KND}\right). \tag{5}$$
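The two rank correlators can be sketched in a few lines. The following is a minimal illustration, assuming NumPy and continuous data without ties; the function names are hypothetical, and the Kendall variant counts pairs directly in $O(n^2)$ operations rather than using a faster algorithm.

```python
import numpy as np

def ranks(v):
    """Rank of each sample (1 = smallest); assumes no ties."""
    v = np.asarray(v)
    r = np.empty(len(v), dtype=float)
    r[np.argsort(v)] = np.arange(1, len(v) + 1)
    return r

def spearman_rho(x, y):
    """Eq. (2), followed by the conversion (3) to the bivariate-normal rho."""
    n = len(x)
    d = ranks(x) - ranks(y)
    r_sp = 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
    return 2.0 * np.sin(np.pi * r_sp / 6.0)

def kendall_rho(x, y):
    """Eq. (4), followed by the conversion (5); direct pair counting."""
    n = len(x)
    p, q = ranks(x), ranks(y)
    i, j = np.triu_indices(n, k=1)          # all pairs with i < j
    # +1 for a concordant pair, -1 for a discordant pair
    s = np.sign(p[j] - p[i]) * np.sign(q[j] - q[i])
    r_knd = 2.0 * np.sum(s) / (n * (n - 1))
    return np.sin(np.pi * r_knd / 2.0)
```

Because only ranks enter both estimates, the amplitude of an outlier has no influence beyond moving that sample to the end of the sorted sequence.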



2.3. Correlator using sums and differences 

One of the first constructions of a robust correlator is based on the quarter-square identity (Gnanadesikan & Kettenring 1972; Huber 1981):

$$\mathrm{cov}(X, Y) = \frac{1}{4}\left[\mathrm{var}(X + Y) - \mathrm{var}(X - Y)\right], \tag{6}$$

where cov denotes covariance and var denotes variance. The correlation coefficient can be obtained with

$$r_{GK} = \frac{1}{4}\,\frac{\mathrm{var}(X + Y) - \mathrm{var}(X - Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}. \tag{7}$$

Therefore, any robust estimator of variance can be used for this correlator (Fridman 2008). Several of them are applied here to (7).
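A minimal sketch of the sum-difference correlator (7) follows, assuming NumPy. The name `r_gk` and the `var` argument are illustrative; any function that maps a one-dimensional array to a variance estimate can be passed in. With the classic variance `np.var` the result coincides with Pearson's coefficient (1), which makes a convenient sanity check.

```python
import numpy as np

def r_gk(x, y, var=np.var):
    """Sum-difference correlator of eq. (7); `var` is any variance estimator."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num = var(x + y) - var(x - y)           # equals 4 * cov(x, y) for the classic variance
    return 0.25 * num / np.sqrt(var(x) * var(y))
```

Replacing `var` by a robust estimator leaves the structure of the correlator unchanged, which is the point of the quarter-square construction.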



2.3.1. Trimming

Samples $Z_1 = X + Y$ and $Z_2 = X - Y$ are sorted in ascending order: $Z_{(1)}, ..., Z_{(n)}$. Let $\gamma$ denote the chosen amount of trimming, $0 \le \gamma < 0.5$, and $k = [\gamma n]$. The sample-trimmed variance is computed by removing the $k$ largest and the $k$ smallest data and using the values that remain:

$$\mathrm{var}_{trim} = \frac{K_{trim}}{n - 2k}\sum_{i=k+1}^{n-k}\left(Z_{(i)} - \mu_{trim}\right)^2, \qquad \mu_{trim} = \frac{1}{n - 2k}\sum_{i=k+1}^{n-k} Z_{(i)}, \tag{8}$$

where $\mu_{trim}$ is the sample mean of the trimmed data. Trimming lessens the variance of the data, and the coefficient $K_{trim}$ makes $\mathrm{var}_{trim}$ a consistent estimator for data with a normal distribution. The value of $K_{trim}$ depends on $\gamma$ (Fridman 2008), but in the context of equation (7) this is not important, because $K_{trim}$ appears symmetrically in the numerator and denominator of (7).
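The trimmed variance (8) can be sketched as follows, assuming NumPy. The consistency factor $K_{trim}$ is deliberately omitted (set to 1), which is harmless only when the result is used inside the ratio (7); the function name is hypothetical.

```python
import numpy as np

def trimmed_var(z, gamma=0.1):
    """Sample-trimmed variance of eq. (8) with K_trim omitted (assumed 1)."""
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    k = int(np.floor(gamma * n))
    core = z[k:n - k]                       # drop the k smallest and k largest samples
    mu_trim = core.mean()
    return np.sum((core - mu_trim) ** 2) / (n - 2 * k)
```

For the ten samples -100, 1, 2, ..., 8, 100 and $\gamma = 0.1$, one sample is removed at each end and the variance of 1, ..., 8 about their mean 4.5 is returned.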
2.3.2. Winsorization

A sample $Z$ is sorted in ascending order. For the chosen $0 \le \gamma < 0.5$ and $k = [\gamma n]$, Winsorization of the sorted data consists of setting

$$W_i = \begin{cases} Z_{(k+1)}, & \text{if } Z_{(i)} \le Z_{(k+1)}, \\ Z_{(i)}, & \text{if } Z_{(k+1)} < Z_{(i)} < Z_{(n-k)}, \\ Z_{(n-k)}, & \text{if } Z_{(i)} \ge Z_{(n-k)}. \end{cases} \tag{9}$$

The Winsorized sample mean is $\mu_{wins} = \frac{1}{n}\sum_{i=1}^{n} W_i$ and the Winsorized sample variance is

$$\mathrm{var}_{wins} = \frac{1}{n - 1}\sum_{i=1}^{n}\left(W_i - \mu_{wins}\right)^2. \tag{10}$$

The essence of Winsorization is to replace the $k$ lowest and the $k$ highest samples of the sorted data $Z$ by the values of their nearest neighbors $Z_{(k+1)}$ and $Z_{(n-k)}$, respectively.
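Winsorization as defined in (9) and (10) amounts to clipping the sorted data at two order statistics. A minimal sketch, assuming NumPy and a hypothetical function name:

```python
import numpy as np

def winsorized_var(z, gamma=0.1):
    """Winsorized sample variance, eqs. (9) and (10)."""
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    k = int(np.floor(gamma * n))
    # z[k] is Z_(k+1) and z[n-k-1] is Z_(n-k) in the 1-based notation of eq. (9)
    w = np.clip(z, z[k], z[n - k - 1])
    mu_wins = w.mean()
    return np.sum((w - mu_wins) ** 2) / (n - 1)
```

Unlike trimming, all $n$ samples are retained, so the extreme values still contribute, but only with the magnitude of their nearest retained neighbors.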

2.3.3. Median Absolute Deviation (MAD) 

This estimate of the variance of sorted data $z_{(1)} \le z_{(2)} \le \dots \le z_{(n)}$ is defined by

$$\mathrm{var}_{med} = \left(1.483 \times \mathrm{med}_{1 \le i \le n}\{|z_i - \mathrm{med}(z_i)|\}\right)^2, \tag{11}$$

where

$$\mathrm{med} = 0.5\,(z_{(m)} + z_{(m+1)}), \quad n = 2m; \qquad \mathrm{med} = z_{(m+1)}, \quad n = 2m + 1.$$

Using $\mathrm{med}(z_i)$ in place of the sample mean and $(\mathrm{med}|z_i - \mathrm{med}(z_i)|)^2$ in place of the sample variance provides a more robust estimate of variance: only central samples of the sorted data (central order statistics) are used, according to the definition of the median, and outliers are excluded. This procedure works well with data contaminated by up to $\approx 50\%$ outliers, but the rms of the estimator (11) is larger than that of the classic variance estimate (see section 3, Table 4).
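The MAD-based variance (11) is short enough to write out directly. The sketch below assumes NumPy, whose `np.median` averages the two central order statistics for even $n$, in agreement with the definition of med given above; the function name is hypothetical.

```python
import numpy as np

def mad_var(z):
    """Variance estimate of eq. (11) based on the median absolute deviation."""
    z = np.asarray(z, dtype=float)
    med = np.median(z)
    return (1.483 * np.median(np.abs(z - med))) ** 2
```

For the samples 1, 2, 3, 4, 100 the median is 3 and the median absolute deviation is 1, so the gross outlier 100 does not affect the estimate at all.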

2.3.4. Qn estimate

This estimate is proposed in (Rousseeuw & Croux 1993) and is moderately effective. It combines the ideas of the Hodges-Lehmann estimate and the Gini estimate (Kendall & Stuart 1967):

$$\mathrm{var}_{Q_n} = 2.108\,\{|x_i - x_j|;\ i < j\}_{(k)}, \tag{12}$$

where $k = \binom{h}{2}$ and $h = [n/2] + 1$, that is $k \approx \binom{n}{2}/4$. This estimator works as follows: the interpoint (pairwise) distances $|x_i - x_j|$, $i < j$, are sorted in ascending order. The $k$th value of this sorted sequence (the $k$th order statistic) multiplied by the consistency factor 2.108 is then taken as the estimate of variance. The value of the consistency factor is also not critical, as for trimming (section 2.3.1).

Other robust algorithms can also be applied (for example, the M-estimator, Huber 1981), but here I was not attempting to make a comprehensive study of all types of robust correlators. Rather I wished to direct attention to the options hitherto unused in designing correlators.

3. Testing the algorithms

3.1. Mitigation of outliers in the temporal domain

The two input signals $X$ and $Y$ are modeled by bivariate random numbers with normal distribution, zero mean, standard deviation $\sigma = 1.0$ and correlation coefficient $\rho = 0.0;\ 0.05;\ 0.1;\ 0.2$. Such $\rho$ are typical of many radio interferometric observations when weak signals are processed. The number of samples in each data sequence is $n$ and the number of repetitions of the correlation is $m$. Outliers $rfi_i$ added to the inputs $x_i$ and $y_i$ are modeled by the following expression:

$$rfi_i = s_i \times \mathrm{sign}(z_i) \times (A_{rfi} + \sigma_{rfi} \times u_i), \tag{13}$$

where $s_i$ is the Poisson random value with the parameter $\lambda = 0.01$, and $A_{rfi} = 20.0$ is the amplitude of outliers, randomized by adding the normal random numbers $u_i$ with the standard deviation $\sigma_{rfi} = 0.3 A_{rfi}$. The random polarities of the outliers are provided by $\mathrm{sign}(z_i)$, where $z_i$ is another auxiliary normal random number. Fig. 2 gives an example of a sequence of input samples.

Fig. 2. Input signals representing Gaussian noise + impulse-like outliers, the number of samples $n = 10000$.

The results of computer simulations, bias and rms of the estimate of $\rho$, are presented in Tables 1-5. The bias of the estimate $\hat{\rho}$ is $bias = \bar{\hat{\rho}} - \rho$. The confidence intervals are calculated for the 95% confidence level, using $rms_{\rho} = 1/\sqrt{mn}$ for $\rho$ and $rms_{rms} = 1/\sqrt{2mn}$ for rms (approximations that are valid for small $\rho$ and large $mn$).

Looking at Tables 1-5 several remarks can be made.



1. In the absence of outliers, robust correlators reproduce practically the same values of $\rho$ as Pearson's correlator (showing no significant bias) and keep the effectiveness close to $1/\sqrt{mn}$ (except MAD). Fig. 3 shows this for correlator (7) with Winsorization.

2. The output of Pearson's correlator is dramatically decorrelated as the amplitude of the outliers grows, whereas robust correlators analyze local statistics ($n$ samples) and eliminate outliers regardless of their amplitudes, see Fig. 4.

3. In the presence of outliers, robust correlators show a small positive bias which is equal to $\approx 1.0\%$ of $\rho$.

4. Spearman's and Kendall's correlators (Table 1) provide a considerable reduction of bias in the presence of RFI compared to Pearson's correlator and keep the rms, i.e., there is no loss in effectiveness within the limits of error.

Table 1. Bias and rms of $\hat{\rho}$ for the Spearman and Kendall correlators, $n = 1000$, $m = 1000$; 95% confidence intervals for bias: $\delta_{bias} = \pm 0.00205$, for rms: $\delta_{rms} = \pm 0.0014$.

Without outliers
        Pearson               Spearman              Kendall
$\rho$   Bias      Rms         Bias      Rms         Bias      Rms
0.00     0.000013  0.03265     0.000101  0.03425     0.000100  0.03428
0.05    -0.000672  0.03245    -0.000436  0.03391    -0.000366  0.03396
0.10     0.000075  0.03268     0.000431  0.03454     0.000511  0.03452
0.20     0.000445  0.03023     0.000696  0.03217     0.000817  0.03207

With outliers
        Pearson               Spearman              Kendall
$\rho$   Bias      Rms         Bias      Rms         Bias      Rms
0.00     0.00156   0.03075     0.000416  0.03453     0.000419  0.03455
0.05    -0.04115   0.03026    -0.000179  0.03399    -0.000131  0.03405
0.10    -0.07844   0.03212    -0.003240  0.03393    -0.003159  0.03392
0.20    -0.16020   0.03144    -0.007427  0.03233    -0.007243  0.03222


Table 2. Bias and rms of $\hat{\rho}$ for trimming, $\gamma = 0.1$, $n = 1000$, $m = 10000$; 95% confidence intervals for bias: $\delta_{bias} = \pm 0.00062$, for rms: $\delta_{rms} = \pm 0.00044$.

        Without outliers                            With outliers
        Pearson               Trimming              Pearson               Trimming
$\rho$   Bias      Rms         Bias      Rms         Bias      Rms         Bias      Rms
0.00    -0.000176  0.031209   -0.000171  0.041631   -0.000048  0.030746   -0.000193  0.042383
0.05    -0.000398  0.031813   -0.000975  0.042481   -0.039886  0.031722    0.000503  0.043066
0.10     0.000326  0.031324    0.000575  0.041588   -0.079740  0.031434   -0.000943  0.042246
0.20    -0.000556  0.030111    0.000614  0.040910   -0.160044  0.032241   -0.002102  0.041670



Table 3. Bias and rms of $\hat{\rho}$ for Winsorization, $\gamma = 0.1$, $n = 1000$, $m = 10000$; 95% confidence intervals for bias: $\delta_{bias} = \pm 0.00062$, for rms: $\delta_{rms} = \pm 0.00044$.

        Without outliers                            With outliers
        Pearson               Winsorization         Pearson               Winsorization
$\rho$   Bias      Rms         Bias      Rms         Bias      Rms         Bias      Rms
0.00     0.000032  0.031597    0.000046  0.037387   -0.000692  0.032661    0.000086  0.038446
0.05    -0.000025  0.031979   -0.000105  0.037754   -0.040128  0.032028   -0.000886  0.038748
0.10    -0.000151  0.031276   -0.000396  0.037174   -0.080514  0.032396   -0.002059  0.038072
0.20     0.000010  0.030453   -0.000253  0.036746   -0.160392  0.032851   -0.002856  0.037700



Fig. 3. Correlation coefficients in the absence of outliers for Pearson's correlator (1) (upper panel) and the Winsorized correlator (7) (lower panel); each of the $m = 100$ correlation coefficients is calculated for $n = 1000$ data samples, $\rho = 0.05$.



Spearman's and Kendall's correlators require more operating capacity per lag, due to the sorting and ranking of samples (this is also valid for the following algorithms).

5. Trimming and Winsorization (Tables 2 and 3) show good results with regard to bias and effectiveness. The rms for trimming is $\approx 10\%$ larger than for Winsorization. The choice of $\gamma = 0.1$ presumes a percentage of outliers of less than 10%. So there is a considerable safety margin here, but the growth of rms for $\gamma = 0.1$ is insignificant. In practice, an adaptive choice of $\gamma$ is possible in each sub-band, with the value of $\gamma$ chosen after taking into account the real RFI environment, i.e., the presence or absence of RFI and its duty cycle.

6. Median absolute deviation (MAD) (Table 4) gives the best results for the bias but, as predicted theoretically, it has the largest rms: $\approx 1.65$ times greater than $1/\sqrt{mn}$.

7. The Qn estimate (Table 5) requires more operating capacity than the others. The bias is as small as for the other algorithms, and the rms is slightly larger ($\approx 10\%$) than $1/\sqrt{mn}$.

8. The proposed methods are effective in the presence of strong impulse-like outliers. When not intercepted, such outliers decorrelate the correlator's output and it is practically impossible to improve this situation with subsequent processing. Low-amplitude outliers are more difficult to detect because they are similar to the rare Gaussian noise spikes, but their decorrelation impact is also much weaker. For example, with an RFI amplitude $\approx 3.0\sigma$, $\lambda = 0.01$ and $\rho = 0.1$ (signal-of-interest), the output of Pearson's correlator is 0.092 and the output of Spearman's correlator is 0.098.

9. The situation is different when outliers are correlated at the inputs of the correlator. In this case outliers produce an excessive, false correlation even in the absence of a signal-of-interest, $\rho = 0.0$. For example, for strong 100%-correlated RFI with an amplitude equal to $30.0\sigma$, $\lambda = 0.001$ and $\rho = 0.0$, the output of Pearson's correlator is 0.349 and the output of the MAD correlator is 0.001.

For correlated RFI, other methods exploiting this correlation property can be more effective, see (Ellingson & Hampson 2003; Jeff et al. 2005; Kesteven 2007; Cornwell et al. 2004).

Table 4. Bias and rms of $\hat{\rho}$ for median absolute deviation (MAD), $n = 1000$, $m = 10000$; 95% confidence intervals for bias: $\delta_{bias} = \pm 0.00097$, for rms: $\delta_{rms} = \pm 0.00069$.

        Without outliers                            With outliers
        Pearson               MAD                   Pearson               MAD
$\rho$   Bias      Rms         Bias      Rms         Bias      Rms         Bias      Rms
0.00     0.000073  0.031411   -0.000057  0.051588    0.000246  0.031372   -0.000097  0.052656
0.05     0.000073  0.031761   -0.000315  0.052502   -0.039889  0.032401    0.001113  0.053639
0.10     0.000064  0.031470   -0.000396  0.051782   -0.080238  0.031536    0.002077  0.052777
0.20    -0.000510  0.030379   -0.000158  0.051560   -0.080546  0.033214    0.002244  0.052823

Table 5. Bias and rms of $\hat{\rho}$ for the Qn estimator, $n = 1000$, $m = 1000$; 95% confidence intervals for bias: $\delta_{bias} = \pm 0.00205$, for rms: $\delta_{rms} = \pm 0.0014$.

        Without outliers                            With outliers
        Pearson               Qn                    Pearson               Qn
$\rho$   Bias      Rms         Bias      Rms         Bias      Rms         Bias      Rms
0.00     0.000488  0.031669    0.000541  0.035316   -0.000122  0.030729    0.000446  0.037311
0.05     0.001529  0.031255    0.001392  0.033659   -0.039906  0.032105    0.000570  0.035250
0.10     0.000362  0.031255    0.000277  0.034868   -0.081281  0.031905    0.001994  0.037097
0.20    -0.001444  0.029626   -0.001721  0.032624   -0.158926  0.033120    0.000594  0.034818

Fig. 4. Correlation coefficients in the presence of outliers for Pearson's correlator (1) (upper panel) and the Winsorized correlator (7) (lower panel), $A_{rfi} = 20.0$, $\rho = 0.05$. The solid line in the upper panel shows the total loss of correlation, the dashed line corresponds to the absence of outliers, while the solid and dashed lines in the lower panel practically coincide.
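The simulation setup of section 3.1 can be sketched as follows. This is a minimal illustration, assuming NumPy, outliers generated independently for the two inputs according to (13) (consistent with the decorrelation discussed in remark 2), and a smaller number of repetitions than in the tables; all function names and the default seed are hypothetical.

```python
import numpy as np

def simulate_inputs(n, rho, rng, lam=0.01, a_rfi=20.0):
    """Bivariate Gaussian inputs plus independent outliers following eq. (13)."""
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    def outliers():
        s = rng.poisson(lam, size=n)              # s_i, zero for most samples
        sign = np.sign(rng.standard_normal(n))    # sign(z_i)
        u = rng.standard_normal(n)                # u_i
        return s * sign * (a_rfi + 0.3 * a_rfi * u)

    return x + outliers(), y + outliers()

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def spearman_rho(x, y):
    """Eq. (2) with the conversion (3); assumes no ties."""
    n = len(x)
    rx = np.empty(n)
    ry = np.empty(n)
    rx[np.argsort(x)] = np.arange(1, n + 1)
    ry[np.argsort(y)] = np.arange(1, n + 1)
    r_sp = 1.0 - 6.0 * np.sum((rx - ry) ** 2) / (n * (n ** 2 - 1))
    return 2.0 * np.sin(np.pi * r_sp / 6.0)

def mean_estimates(rho, n=1000, m=200, seed=0):
    """Mean Pearson and Spearman estimates over m repetitions of n samples."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(m):
        x, y = simulate_inputs(n, rho, rng)
        results.append((pearson(x, y), spearman_rho(x, y)))
    return np.mean(results, axis=0)
```

Calling `mean_estimates(0.2)` returns the two averaged estimates; subtracting the input $\rho$ gives the bias in the sense used in the tables, although the exact figures depend on the random seed and on the assumptions stated above.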

3.2. Mitigation of outliers in the frequency domain 

Narrow-band RFI can be detected as an impulse-like outlier in the frequency domain. The following example illustrates the application of trimming (8) in this case. RFI is generated as a phase-modulated sinusoidal carrier, see Fig. 5. The power spectrum of the unmodulated carrier is shown in the upper panel, and the power spectrum of the randomly phase-modulated signal is shown in the lower panel:

$$s_i = A_{rfi}\sin\left(2\pi F i + (\pi/2)\,\mathrm{rect}(i)\right), \quad i = 1 \dots n, \tag{14}$$

where $\mathrm{rect}(i)$ is a rectangular wave with Poisson-distributed random jumps from $-1$ to $+1$, and the parameter of the Poisson distribution is $\lambda = 0.01$. The mixture of two input signals and their power spectra are represented in Fig. 6. The amplitudes are $A_{1rfi} = A_{2rfi} = 0.5$ and the frequencies are $F_1 = 0.3$ and $F_2 = 0.4$. Therefore, RFI is not visible in the temporal domain but is easily visible in the spectral domain after FFT with $n = 2048$ as two bursts.

Fig. 5. Power spectra of the unmodulated sinusoidal carrier (upper panel) and the randomly phase-modulated carrier with the phase jumps $\pm\pi/2$ (lower panel).

Fig. 6. Input signal $x$, noise + RFI (upper panel), and its power spectrum (upper middle panel); input signal $y$ (lower middle panel) and its power spectrum (lower panel).

The statistics of the power spectra are analyzed following (8), and the indexes of outliers exceeding the level equal to the 1.5%-percentile are used to reject the samples of the complex spectra $\tilde{x}$ and $\tilde{y}$ of signals $x$ and $y$, where the tilde denotes the Fourier transform. These indexes are tagged with the indicator function $ind(i)$ taking the values 0 or 1, $i = 1 \dots n$, and assigned to the sequence of $n$ samples of the complex spectra $\tilde{x}$ and $\tilde{y}$. The "sum-difference" correlator (7) is used in the spectral domain (FX-correlator):

$$\mathrm{var}(sum) = \mathrm{var}(\Re(sum)) + \mathrm{var}(\Im(sum)), \quad sum = \tilde{x} + \tilde{y},$$
$$\mathrm{var}(dif) = \mathrm{var}(\Re(dif)) + \mathrm{var}(\Im(dif)), \quad dif = \tilde{x} - \tilde{y}. \tag{15}$$

Robust variance using the trimming algorithm (8) is applied while calculating (15): outliers are marked by the indicator function $ind(i)$. The results of computer simulation for $n = 2048$ and $m = 100$ are shown in Figs. 7, 8 and 9. Fig. 7 shows the $m$ outputs of Pearson's correlator (upper panel) and the robust correlator (15) (lower panel) for the correlation coefficient of the input signals $\rho = 0$, i.e., for uncorrelated inputs, except for the RFI, which is 100% correlated. The upper panel shows considerable bias, while the fluctuations of the correlator output in the lower panel vary around zero. Fig. 8 shows the same situation for $\rho = 0.1$ and Fig. 9 for $\rho = 0.2$. In all these examples a considerable bias is visible for Pearson's correlator and there is an absence of bias for the "sum-difference" correlator.

Fig. 7. Pearson's correlator output (upper panel) and the "sum-difference" correlator output (lower panel), $\rho = 0.0$.

Fig. 8. Pearson's correlator output (upper panel) and the "sum-difference" correlator output (lower panel), $\rho = 0.1$.

Fig. 9. Pearson's correlator output (upper panel) and the "sum-difference" correlator output (lower panel), $\rho = 0.2$.

Of course, other post-correlation methods can be used in the case of narrow-band persistent RFI with a stable spatial orientation, for example (Cornwell et al. 2004). But in the case of sporadic burst-like RFI, the proposed pre-correlation statistical analysis is more appropriate: only $n \approx 2000$ samples were used for each correlator input, which corresponds to a microsecond time scale for typical bandwidths.
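The spectral-domain procedure can be illustrated with a simplified sketch. It assumes NumPy and, for simplicity, replaces the phase-modulated model (14) by an unmodulated carrier placed exactly on an FFT bin and common to both inputs; it also interprets the rejection level as the upper 1.5% of the power-spectrum values. Both choices, the amplitude and all names are assumptions made for illustration and are not taken from the text.

```python
import numpy as np

def cvar(z):
    """Variance of complex samples as in eq. (15): var(Re) + var(Im)."""
    return np.var(z.real) + np.var(z.imag)

def spectral_sum_difference(x, y, reject_percent=1.5):
    """Correlator (7) on complex spectra, with high-power channels rejected."""
    xs, ys = np.fft.fft(x), np.fft.fft(y)
    px, py = np.abs(xs) ** 2, np.abs(ys) ** 2
    # Indicator function: True for channels retained in both spectra.
    keep = (px <= np.percentile(px, 100.0 - reject_percent)) & \
           (py <= np.percentile(py, 100.0 - reject_percent))
    xs, ys = xs[keep], ys[keep]
    num = cvar(xs + ys) - cvar(xs - ys)
    return 0.25 * num / np.sqrt(cvar(xs) * cvar(ys))

def demo(n=2048, amp=1.0, bin_index=614, seed=0):
    """Uncorrelated noise inputs (rho = 0) plus a common narrow-band carrier."""
    rng = np.random.default_rng(seed)
    i = np.arange(n)
    rfi = amp * np.sin(2 * np.pi * bin_index * i / n)   # 100% correlated RFI
    x = rng.standard_normal(n) + rfi
    y = rng.standard_normal(n) + rfi
    return np.corrcoef(x, y)[0, 1], spectral_sum_difference(x, y)
```

In this idealized case the carrier occupies two spectral channels only, so rejecting the strongest channels removes it completely; a modulated carrier as in (14) spreads over neighboring channels, and the residual bias then depends on the rejection level.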

3.2.1. Processing of observational data 

Several examples of applications of the aforementioned algorithms are presented here.

The auto-spectrum in Fig. 10 is calculated using the "raw" data recorded at LOFAR CS1 for three hours. Data consisting of complex eight-bit numbers and having a sample time interval equal to 6.4 μs were recorded on the sub-band with central frequency $f = 205.781$ MHz, bandwidth $\Delta f = 156.25$ kHz. This time-frequency presentation corresponds to 32 frequency channels, i.e., the frequency resolution is $\delta f \approx 4.88$ kHz, and the integration time is $\tau \approx 1.7$ s.

Fig. 10. Auto-spectrum of system noise + RFI from LOFAR CS1 at the central frequency $f = 205.781$ MHz, bandwidth $\Delta f = 156.25$ kHz, frequency resolution $\delta f \approx 4.88$ kHz, integration time $\tau \approx 1.7$ s. Panel title: 17Apr2008, band025, df = 156.25 kHz, 32 channels, 3 hours; axes: frequency (kHz) and time (s).

Fig. 11 demonstrates the auto-spectrum calculated from the same data for 1024 frequency channels. The high spectral resolution, $\delta f \approx 153$ Hz, permits the separation of RFI spikes, i.e., they are not smoothed, as would have been the case with 32 channels, see Fig. 10. Only half of the 3-hour duration auto-spectrum is shown in Fig. 11 (due to the constraints of computer memory). The auto-spectrum in Fig. 11 is calculated without any censoring of outliers.

Fig. 11. Auto-spectrum of system noise + RFI from LOFAR CS1 at the central frequency $f = 205.781$ MHz, bandwidth $\Delta f = 156.25$ kHz, frequency resolution $\delta f \approx 153$ Hz, integration time $\tau \approx 1.7$ s. Panel title: 17Apr2008, band025, df = 156.25 kHz, 1024 channels, 3 hours; axes: frequency (kHz) and time (s).

Fig. 12 shows the "cleaned" auto-spectrum, calculated with Winsorization.

Fig. 12. Auto-spectrum of the same data as in Fig. 11 calculated with Winsorization.

This example shows the importance of a sufficiently high frequency resolution. All RFI visible in Fig. 11 are invisible in the temporal domain; they are too weak and they are "below" the system noise. The low level of RFI also made it necessary to average the instantaneous power spectra obtained after each FFT. The number of averaged spectra is 256 and thus the time resolution in Fig. 11 is $6.4 \times 10^{-6} \times 1024 \times 256 = 1.677$ s. The ripple structure clearly visible in the auto-spectra is produced by the transfer functions of the LOFAR digital filters separating the whole input bandwidth into dozens of sub-bands with the partial bandwidths $\Delta f = 156.25$ kHz (or 200 kHz).

The censoring of outliers may also be useful in the case of post-correlation data produced by the LOFAR correlator. The filter bank of the LOFAR backend divides each of the 156 kHz-bandwidth sub-bands into 256 narrow "sub-sub-bands" with a bandwidth equal to 610.3516 Hz. The sample time interval after preliminary averaging is 1 s. This good time-frequency resolution allows the fine-grain structure of RFI to be seen.

The following example in Fig. 13 shows the post-correlation cross-spectrum at the frequency $f_0 = 131.4453$ MHz. Fig. 14 shows the corresponding Winsorized cross-spectrum and Fig. 15 shows the cross-spectra averaged along the whole time interval of $4 \times 10^4$ s, without Winsorization (upper and middle panels) and with Winsorization (lower panel). The weak correlated component (the signal-of-interest) is produced by some cross-polarization effects and bears a ripple structure similar to the auto-spectra.

Fig. 13. Post-correlation cross-spectrum, central frequency $f_0 = 131.4453$ MHz, bandwidth $\Delta f = 156.25$ kHz, frequency resolution $\delta f = 610$ Hz, time resolution $\delta t = 1$ s.

Fig. 14. Post-correlation cross-spectrum from Fig. 13 with outliers removed.

Fig. 15. Averaged cross-spectra corresponding to Fig. 13 (upper and middle (zoomed) panels) and to Fig. 14 (lower panel).
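The time and frequency resolutions quoted for the 1024-channel auto-spectrum follow directly from the sample interval, the FFT length and the number of averaged spectra. A minimal sketch, assuming NumPy and a hypothetical array `samples` of complex raw data; the function name and its defaults are illustrative.

```python
import numpy as np

def averaged_spectrogram(samples, n_chan=1024, n_avg=256):
    """Power spectra of consecutive FFT blocks, averaged in groups of n_avg."""
    samples = np.asarray(samples)
    n_spec = len(samples) // (n_chan * n_avg)
    blocks = samples[:n_spec * n_chan * n_avg].reshape(n_spec, n_avg, n_chan)
    power = np.abs(np.fft.fft(blocks, axis=-1)) ** 2
    return power.mean(axis=1)               # shape (n_spec, n_chan)

SAMPLE_INTERVAL = 6.4e-6                    # seconds, from the text
time_resolution = SAMPLE_INTERVAL * 1024 * 256       # about 1.68 s
freq_resolution = 1.0 / (SAMPLE_INTERVAL * 1024)     # about 153 Hz
```

Each row of the returned array is one averaged spectrum; robust estimates such as Winsorization can then be applied along either axis of this time-frequency plane.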
4. Conclusions

1. Estimates of the correlation coefficient are an important part of both classical and robust statistics. Statistical analysis of raw data with the finest available time and frequency resolution can help during observations in an RFI-contaminated environment. Growing concern about RFI pollution should persuade the radio astronomy community to pay more attention to the variety of algorithms developed in the realm of robust statistics. This framework of robust estimates puts the successfully tested blanking of RFI on a more stable foundation.

2. Statistically faithful, robust estimates of the correlation coefficient are especially appropriate for application in an impulse-like strong RFI environment. RFI is effectively suppressed and the accompanying bias and loss of effectiveness are tolerable. The aforementioned robust algorithms can be usefully applied in these particular situations.

3. The choice of a particular algorithm depends upon the type and intensity of the RFI. The proportion of RFI presence in the data is also important. The type of implementation, off-line or real-time, may determine the choice. It is difficult to equip existing conventional hardware correlators with these tools. Future generations of radio telescopes (LOFAR, ATA, SKA) will generate such huge amounts of data that real-time processing is vital: DSP, FPGA or supercomputers are possible solutions. Software correlators are especially suited to the implementation of robust schemes as optional subroutines.